Swift Regular Expression Matching

Abstract

Queries involving regular expressions (RegEx) have a wide range of applications, including document stores, bioinformatics and information retrieval. However, executing RegEx queries efficiently over large datasets remains challenging. Data scans do not scale well with input size, yet existing techniques that avoid data scans (referred to as “black-box” approaches) offer little or no benefit over data scans for RegEx. Black-box approaches typically execute a RegEx query by decomposing it along its operators, computing intermediate results for the individual sub-queries (using indexes and/or partial data scans), and combining those intermediate results according to the corresponding operators. We analyze the black-box approach and identify operators for which its execution time can be far from optimal. We then propose Swift, a set of transformations over the original RegEx that make it possible to avoid the black-box approach for such operators. We implement Swift over several data structures (including suffix trees, suffix arrays and compressed indexes) and show that Swift achieves significant speedups over both the black-box approach and popular open-source data stores that support RegEx via data scans, in some cases by as much as two orders of magnitude.
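The black-box strategy described above can be illustrated with a small Python sketch. It assumes a toy query AST with only literals, concatenation and union (no Kleene star) and a naively built suffix array; the names `find_literal`, `eval_black_box`, `expand_to_literals` and `eval_rewritten` are hypothetical. The second evaluator shows one simple rewrite of the general kind the abstract alludes to (distributing concatenation over union so that each alternative becomes a single index lookup). It is an illustration only, not necessarily one of Swift's actual transformations.

```python
# Toy query AST (hypothetical): ("lit", s) | ("cat", a, b) | ("alt", a, b).
# No Kleene star, so every query denotes a finite set of strings.

def build_suffix_array(text):
    """Naive suffix array: start offsets sorted by suffix (O(n^2 log n), fine for a sketch)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_literal(text, sa, lit):
    """Index lookup: all (start, end) spans where `lit` occurs, via binary search on `sa`."""
    m = len(lit)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= lit
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < lit:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > lit
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= lit:
            lo = mid + 1
        else:
            hi = mid
    return {(sa[i], sa[i] + m) for i in range(first, lo)}


def eval_black_box(node, text, sa):
    """Black-box style: evaluate each sub-query separately, then combine along the operator."""
    kind = node[0]
    if kind == "lit":
        return find_literal(text, sa, node[1])
    left = eval_black_box(node[1], text, sa)
    right = eval_black_box(node[2], text, sa)
    if kind == "alt":
        return left | right
    if kind == "cat":
        # Join step: a left span must end exactly where a right span starts.
        ends_by_start = {}
        for s, e in right:
            ends_by_start.setdefault(s, []).append(e)
        return {(s, e2) for s, e in left for e2 in ends_by_start.get(e, [])}
    raise ValueError(f"unknown operator {kind!r}")


def expand_to_literals(node):
    """Illustrative rewrite: distribute concatenation over union, yielding plain literals."""
    kind = node[0]
    if kind == "lit":
        return {node[1]}
    left, right = expand_to_literals(node[1]), expand_to_literals(node[2])
    if kind == "alt":
        return left | right
    if kind == "cat":
        return {a + b for a in left for b in right}
    raise ValueError(f"unknown operator {kind!r}")


def eval_rewritten(node, text, sa):
    """One index lookup per expanded literal; no intermediate join."""
    spans = set()
    for lit in expand_to_literals(node):
        spans |= find_literal(text, sa, lit)
    return spans


text = "abcxbcab"
sa = build_suffix_array(text)
query = ("cat", ("alt", ("lit", "a"), ("lit", "x")), ("lit", "bc"))  # (a|x)bc
print(sorted(eval_black_box(query, text, sa)))  # [(0, 3), (3, 6)]
print(sorted(eval_rewritten(query, text, sa)))  # [(0, 3), (3, 6)]
```

Both evaluators return the same spans here, but the black-box path first materializes every occurrence of `a`, `x` and `bc` and then joins them, whereas the rewritten query looks up only the two full literals. When a sub-pattern is frequent, those intermediate sets can be much larger than the final answer. The naive expansion is only practical while the number of expanded literals stays small, since it can grow exponentially with nested alternations.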

Similar resources

Approximate Regular Expression Matching

We extend the definition of Hamming and Levenshtein distance between two strings, as used in approximate string matching, so that these two distances can also be used in approximate regular expression matching. Next, we present methods for constructing nondeterministic finite automata for approximate regular expression matching under both of these distances.
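The automata described in this abstract build on the classic NFA for approximate string matching. As a hedged sketch restricted to a literal pattern (not a full regular expression), the NFA below has one state per (pattern position, error count) pair and is simulated directly. The function `approx_find_ends` and its `metric` switch are hypothetical, and the construction is a textbook one, not necessarily the one in the paper. Hamming distance allows only match and substitution moves; Levenshtein distance additionally allows insertion moves and epsilon moves for deletions.

```python
def approx_find_ends(pattern, text, k, metric="levenshtein"):
    """Simulate an NFA with states (i, e): i pattern characters consumed, e errors spent.

    Hypothetical helper for a literal pattern. Returns the 1-based text positions where
    some substring ending there is within distance k of `pattern` (position 0, i.e. an
    empty match before any text character, is not reported).
    """
    if metric not in ("levenshtein", "hamming"):
        raise ValueError(f"unknown metric {metric!r}")
    m = len(pattern)
    lev = metric == "levenshtein"

    def closure(states):
        # Epsilon moves exist only for Levenshtein: skip (delete) a pattern character.
        result, stack = set(states), list(states)
        while stack:
            i, e = stack.pop()
            nxt = (i + 1, e + 1)
            if lev and i < m and e < k and nxt not in result:
                result.add(nxt)
                stack.append(nxt)
        return result

    active = closure({(0, 0)})
    ends = []
    for pos, c in enumerate(text, start=1):
        step = {(0, 0)}  # initial state loops on every character: a match may start anywhere
        for i, e in active:
            if i < m and pattern[i] == c:
                step.add((i + 1, e))      # match
            if i < m and e < k:
                step.add((i + 1, e + 1))  # substitution
            if lev and e < k:
                step.add((i, e + 1))      # insertion (extra text character)
        active = closure(step)
        if any(i == m for i, _ in active):
            ends.append(pos)
    return ends


print(approx_find_ends("abc", "xabdabc", 1, metric="hamming"))  # [4, 7]
print(approx_find_ends("abc", "xacx", 1))                        # [3]
print(approx_find_ends("abc", "xacx", 1, metric="hamming"))      # []
```

The last two calls show the difference between the metrics: `ac` is within Levenshtein distance 1 of `abc` (one deletion), but no length-3 window of `xacx` is within Hamming distance 1.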

Prefix-Free Regular-Expression Matching

We explore the regular-expression matching problem with respect to prefix-freeness of the pattern. We show that a prefix-free regular expression yields only a number of matching substrings that is linear in the size of a given text. Based on this observation, we propose an efficient algorithm for the prefix-free regular-expression matching problem. Furthermore, we suggest an algorithm to determine wheth...
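The linear bound can be checked on a small example. If no word in the pattern's language is a proper prefix of another, at most one matching substring can start at each text position, so a text of length n has at most n matches. The hypothetical helper `matching_substrings` below checks this by brute force with Python's `re` module; it is not the paper's matching algorithm. The language of `a*b` is prefix-free, while that of `a+` is not, because `a` is a prefix of `aa`.

```python
import re


def matching_substrings(pattern, text):
    """Brute force: every non-empty span (i, j) with text[i:j] fully matching `pattern`."""
    rx = re.compile(pattern)
    n = len(text)
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n + 1)
            if rx.fullmatch(text, i, j)]


for n in (4, 8, 16):
    text = "a" * n + "b"
    prefix_free = matching_substrings(r"a*b", text)     # L(a*b) is prefix-free
    not_prefix_free = matching_substrings(r"a+", text)  # "a" is a prefix of "aa"
    # Prefix-free: at most one match starts at each position, so at most len(text) matches.
    assert len({i for i, _ in prefix_free}) == len(prefix_free) <= len(text)
    print(n, len(prefix_free), len(not_prefix_free))
# 4 5 10
# 8 9 36
# 16 17 136
```

The prefix-free pattern grows linearly (n + 1 matches), while `a+` produces n(n + 1)/2 matches on the same texts.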

Prefix-free regular languages and pattern matching

We explore the regular-expression matching problem with respect to prefix-freeness of the pattern. We prove that a prefix-free regular expression yields only a number of matching substrings that is linear in the size of a given text. Based on this observation, we propose an efficient algorithm for the prefix-free regular-expression matching problem. Furthermore, we suggest an algorithm to determine whet...

Regular Expression Matching on Graphics Hardware for Intrusion Detection

The expressive power of regular expressions has often been exploited in network intrusion detection systems, virus scanners, and spam filtering applications. However, the flexible pattern matching functionality of regular expressions in these systems comes with significant overheads in terms of both memory and CPU cycles, since every byte of the inspected input needs to be processed and compare...

Publication date: 2015